research assistant
ToPolyAgent: AI Agents for Coarse-Grained Topological Polymer Simulations
Ding, Lijie, Carrillo, Jan-Michael, Do, Changwoo
We introduce ToPolyAgent, a multi-agent AI framework for performing coarse-grained molecular dynamics (MD) simulations of topological polymers through natural language instructions. By integrating large language models (LLMs) with domain-specific computational tools, ToPolyAgent supports both interactive and autonomous simulation workflows across diverse polymer architectures, including linear, ring, brush, and star polymers, as well as dendrimers. The system consists of four LLM-powered agents: a Config Agent for generating initial polymer-solvent configurations, a Simulation Agent for executing LAMMPS-based MD simulations and conformational analyses, a Report Agent for compiling markdown reports, and a Workflow Agent for streamlined autonomous operations. Interactive mode incorporates user feedback loops for iterative refinements, while autonomous mode enables end-to-end task execution from detailed prompts. We demonstrate ToPolyAgent's versatility through case studies involving diverse polymer architectures under varying solvent conditions, thermostats, and simulation lengths. Furthermore, we highlight its potential as a research assistant by directing it to investigate the effect of interaction parameters on the linear polymer conformation, and the influence of grafting density on the persistence length of the brush polymer. By coupling natural language interfaces with rigorous simulation tools, ToPolyAgent lowers barriers to complex computational workflows and advances AI-driven materials discovery in polymer science. It lays the foundation for autonomous and extensible multi-agent scientific research ecosystems.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > Tennessee > Anderson County > Oak Ridge (0.04)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- Research Report (1.00)
- Workflow (0.90)
- Energy (0.68)
- Government > Regional Government > North America Government > United States Government (0.46)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers
Pan, Haining, Roggeveen, James V., Berg, Erez, Carrasquilla, Juan, Chowdhury, Debanjan, Ganguli, Surya, Ghimenti, Federico, Hasik, Juraj, Hunt, Henry, Jiang, Hong-Chen, Kamb, Mason, Kao, Ying-Jer, Khatami, Ehsan, Lawler, Michael J., Luo, Di, Neupert, Titus, Qi, Xiaoliang, Brenner, Michael P., Kim, Eun-Ah
Large language models (LLMs) have shown remarkable progress in coding and math problem-solving, but evaluation on advanced research-level problems in hard sciences remains scarce. To fill this gap, we present CMT-Benchmark, a dataset of 50 problems covering condensed matter theory (CMT) at the level of an expert researcher. Topics span analytical and computational approaches in quantum many-body and classical statistical mechanics. The dataset was designed and verified by a panel of expert researchers from around the world. We built the dataset through a collaborative environment that challenges the panel to write and refine problems they would want a research assistant to solve, including Hartree-Fock, exact diagonalization, quantum/variational Monte Carlo, density matrix renormalization group (DMRG), quantum/classical statistical mechanics, and model building. We evaluate LLMs by programmatically checking solutions against expert-supplied ground truth. We developed machine-grading methods, including symbolic handling of non-commuting operators via normal ordering. These grading methods also generalize across tasks. Our evaluations show that frontier models struggle with all of the problems in the dataset, highlighting a gap in the physical reasoning skills of current LLMs. Notably, experts identified strategies for creating increasingly difficult problems by interacting with the LLMs and exploiting common failure modes. The best model, GPT5, solves 30% of the problems; the average across 17 models (GPT, Gemini, Claude, DeepSeek, Llama) is 11.4 ± 2.1%. Moreover, 18 problems are solved by none of the 17 models, and 26 by at most one. These unsolved problems span Quantum Monte Carlo, Variational Monte Carlo, and DMRG. Answers sometimes violate fundamental symmetries or have unphysical scaling dimensions. We believe this benchmark will guide development toward capable AI research assistants and tutors.
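As a rough illustration of what programmatically checking a solution against ground truth can look like, the sketch below grades a numeric answer within a tolerance and computes a solve rate. It is a simplified assumption, not the benchmark's grader: the function names `grade` and `solve_rate` and the tolerance are hypothetical, and the symbolic normal-ordering checks mentioned in the abstract are not covered.

```python
import math


def grade(submitted, expected, rel_tol=1e-6):
    """Return True if a numeric answer matches the ground truth within tolerance.

    Hypothetical helper: the abstract does not describe the grader's interface.
    """
    try:
        value = float(submitted)
    except (TypeError, ValueError):
        # Non-numeric submissions are simply marked as unsolved.
        return False
    return math.isclose(value, expected, rel_tol=rel_tol)


def solve_rate(results):
    """Percentage of problems solved, given a list of booleans."""
    return 100.0 * sum(results) / len(results)


if __name__ == "__main__":
    print(grade("0.5000000001", 0.5))  # within tolerance
    print(grade("not a number", 0.5))  # unparseable answer
    # 15 solved problems out of 50 corresponds to the 30% figure quoted above.
    print(solve_rate([True] * 15 + [False] * 35))
```

On a 50-problem dataset, a 30% solve rate corresponds to 15 solved problems, which is the case the last line reproduces.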
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Asia > Taiwan (0.04)
- Education (1.00)
- Energy (0.68)
- Government > Regional Government (0.46)
Hallucination-Resistant, Domain-Specific Research Assistant with Self-Evaluation and Vector-Grounded Retrieval
Bhavsar, Vivek, Ereifej, Joseph, Gurusami, Aravanan
Large language models accelerate literature synthesis but can hallucinate and mis-cite, limiting their usefulness in expert workflows. We present RA-FSM (Research Assistant - Finite State Machine), a modular GPT-based research assistant that wraps generation in a finite-state control loop: Relevance -> Confidence -> Knowledge. The system is grounded in vector retrieval and a deterministic citation pipeline. The controller filters out-of-scope queries, scores answerability, decomposes questions, triggers retrieval only when needed, and emits answers with confidence labels and in-corpus, de-duplicated references. A ranked-tier ingestion workflow constructs a domain knowledge base from journals, conferences, indices, preprints, and patents, writing both to a dense vector index and to a relational store of normalized metrics. We implement the system for photonics and evaluate it on six task categories: analytical reasoning, numerical analysis, methodological critique, comparative synthesis, factual extraction, and application design. In blinded A/B reviews, domain experts prefer RA-FSM to both a strong Notebook LM (NLM) baseline and a vanilla, single-pass default GPT API baseline, citing stronger boundary-condition handling and more defensible evidence use. Coverage and novelty analyses indicate that RA-FSM explores beyond the NLM while incurring tunable latency and cost overheads. The design emphasizes transparent, well-cited answers for high-stakes technical work and is generalizable to other scientific domains.
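The Relevance -> Confidence -> Knowledge loop described above can be pictured with a toy state machine. This is a minimal sketch under stated assumptions, not the authors' implementation: the keyword-overlap scoring, the in-memory `CORPUS`, the `DOMAIN_TERMS` set, and all function names are hypothetical stand-ins for the GPT calls and vector index the abstract mentions.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    RELEVANCE = auto()
    CONFIDENCE = auto()
    KNOWLEDGE = auto()
    DONE = auto()


@dataclass
class Answer:
    text: str
    confidence: str  # "high", "low", or "n/a" for out-of-scope queries
    references: list = field(default_factory=list)


# Hypothetical in-memory corpus standing in for the vector index.
CORPUS = {
    "ref-1": "silicon photonics waveguide loss measurements",
    "ref-2": "ring resonator modulators for optical interconnects",
}
# Hypothetical vocabulary used to decide whether a query is in scope.
DOMAIN_TERMS = {"photonics", "waveguide", "resonator", "optical", "laser"}


def tokenize(text):
    return set(text.lower().replace("?", "").split())


def retrieve(query):
    """Return sorted, de-duplicated ids of corpus entries sharing a word with the query."""
    words = tokenize(query)
    hits = [ref for ref, text in CORPUS.items() if words & tokenize(text)]
    return sorted(set(hits))


def run(query):
    state = State.RELEVANCE
    confidence, refs = "n/a", []
    while state is not State.DONE:
        if state is State.RELEVANCE:
            # Filter out-of-scope queries before doing any other work.
            if not tokenize(query) & DOMAIN_TERMS:
                return Answer("out of scope", "n/a")
            state = State.CONFIDENCE
        elif state is State.CONFIDENCE:
            # Toy answerability score: number of domain terms in the query.
            overlap = len(tokenize(query) & DOMAIN_TERMS)
            confidence = "high" if overlap >= 2 else "low"
            # Retrieval is triggered only when confidence is low.
            state = State.DONE if confidence == "high" else State.KNOWLEDGE
        elif state is State.KNOWLEDGE:
            refs = retrieve(query)
            state = State.DONE
    return Answer(f"answer to: {query}", confidence, refs)


if __name__ == "__main__":
    print(run("What is the best pizza topping?"))
    print(run("optical waveguide loss"))
    print(run("waveguide loss in silicon"))
```

Keeping the controller as an explicit state machine makes each decision (reject, answer directly, or retrieve first) easy to log and audit, which fits the abstract's emphasis on transparent, well-cited answers.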
- Europe > Austria > Vienna (0.14)
- North America > United States > California > Santa Clara County > Santa Clara (0.04)
- Workflow (0.86)
- Research Report > New Finding (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Using AI to Summarize US Presidential Campaign TV Advertisement Videos, 1952-2012
Breuer, Adam, Dietrich, Bryce J., Crespin, Michael H., Butler, Matthew, Pyrse, J. A., Imai, Kosuke
This paper introduces the largest and most comprehensive dataset of US presidential campaign television advertisements, available in digital format. The dataset also includes machine-searchable transcripts and high-quality summaries designed to facilitate a variety of academic research. To date, there has been great interest in collecting and analyzing US presidential campaign advertisements, but the need for manual procurement and annotation led many to rely on smaller subsets. We design a large-scale parallelized, AI-based analysis pipeline that automates the laborious process of preparing, transcribing, and summarizing videos. We then apply this methodology to the 9,707 presidential ads from the Julian P. Kanter Political Commercial Archive. We conduct extensive human evaluations to show that these transcripts and summaries match the quality of manually generated alternatives. We illustrate the value of this data by including an application that tracks the genesis and evolution of current focal issue areas over seven decades of presidential elections. Our analysis pipeline and codebase also show how to use LLM-based tools to obtain high-quality summaries for other video datasets.
- North America > United States > Arkansas (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Wisconsin (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.68)
- Marketing (1.00)
- Government > Voting & Elections (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants
Cappello, Franck, Madireddy, Sandeep, Underwood, Robert, Getty, Neil, Chia, Nicholas Lee-Ping, Ramachandra, Nesar, Nguyen, Josh, Keceli, Murat, Mallick, Tanwi, Li, Zilinghan, Ngom, Marieme, Zhang, Chenhui, Yanguas-Gil, Angel, Antoniuk, Evan, Kailkhura, Bhavya, Tian, Minyang, Du, Yufeng, Ting, Yuan-Sen, Wells, Azton, Nicolae, Bogdan, Maurya, Avinash, Rafique, M. Mustafa, Huerta, Eliu, Li, Bo, Foster, Ian, Stevens, Rick
Recent advancements have positioned AI, and particularly Large Language Models (LLMs), as transformative tools for scientific research, capable of addressing complex tasks that require reasoning, problem-solving, and decision-making. Their exceptional capabilities suggest their potential as scientific research assistants but also highlight the need for holistic, rigorous, and domain-specific evaluation to assess effectiveness in real-world scientific applications. This paper describes a multifaceted methodology for Evaluating AI models as scientific Research Assistants (EAIRA) developed at Argonne National Laboratory. This methodology incorporates four primary classes of evaluations: 1) Multiple Choice Questions to assess factual recall; 2) Open Response to evaluate advanced reasoning and problem-solving skills; 3) Lab-Style Experiments involving detailed analysis of capabilities as research assistants in controlled environments; and 4) Field-Style Experiments to capture researcher-LLM interactions at scale in a wide range of scientific domains and applications. These complementary methods enable a comprehensive analysis of LLM strengths and weaknesses with respect to their scientific knowledge, reasoning abilities, and adaptability. Recognizing the rapid pace of LLM advancements, we designed the methodology to evolve and adapt so as to ensure its continued relevance and applicability. This paper describes the state of the methodology at the end of February 2025. Although developed within a subset of scientific domains, the methodology is designed to be generalizable to a wide range of scientific domains.
- North America > United States > Pennsylvania (0.04)
- North America > United States > Ohio (0.04)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Health & Medicine (1.00)
- Energy (1.00)
- Education (1.00)
- Government > Regional Government > North America Government > United States Government (0.93)
Agent Laboratory: Using LLM Agents as Research Assistants
Schmidgall, Samuel, Su, Yusheng, Wang, Ze, Sun, Ximeng, Wu, Jialian, Yu, Xiaodong, Liu, Jiang, Liu, Zicheng, Barsoum, Emad
Historically, scientific discovery has been a lengthy and costly process, demanding substantial time and resources from initial conception to final results. To accelerate scientific discovery, reduce research costs, and improve research quality, we introduce Agent Laboratory, an autonomous LLM-based framework capable of completing the entire research process. This framework accepts a human-provided research idea and progresses through three stages--literature review, experimentation, and report writing--to produce comprehensive research outputs, including a code repository and a research report, while enabling users to provide feedback and guidance at each stage. We deploy Agent Laboratory with various state-of-the-art LLMs and invite multiple researchers to assess its quality by participating in a survey, providing human feedback to guide the research process, and then evaluating the final paper. We found that: (1) Agent Laboratory driven by o1-preview generates the best research outcomes; (2) The generated machine learning code is able to achieve state-of-the-art performance compared to existing methods; (3) Human involvement, providing feedback at each stage, significantly improves the overall quality of research; (4) Agent Laboratory significantly reduces research expenses, achieving an 84% decrease compared to previous autonomous research methods. We hope Agent Laboratory enables researchers to allocate more effort toward creative ideation rather than low-level coding and writing, ultimately accelerating scientific discovery.
The Social Impact of Generative LLM-Based AI
The research was partially supported by the Paul and Marcia Wythes Center on Contemporary China and Office of Population Research at Princeton University. We are grateful to Wen Liu, Gou Wu, and Dean Minello for their excellent research assistance. The ideas expressed herein are those of the authors.
Abstract: Like it or not, ready or not, we are likely to enter a new phase of human history in which Artificial Intelligence (AI) will dominate economic production and social life - the AI Revolution. Before the actual arrival of the AI Revolution, it is time for us to speculate on how AI will impact the social world. In this article, we focus on the social impact of generative LLM-based AI (GELLMAI), discussing societal factors that contribute to its technological development and its potential roles in enhancing both between-country and within-country social inequality. There are good indications that the US and China will lead the field and will be the main competitors for domination of AI in the world. We conjecture the AI Revolution will likely give rise to a post-knowledge society in which knowledge per se will become less important than in today's world. Instead, individual relationships and social identity will become more important.
With the advent of Generative Large Language Model (LLM)-based Artificial Intelligence (AI) tools such as ChatGPT from OpenAI and Bard from Google, it is natural to wonder about the social impact of this technology. In the remainder of this paper, we will refer to generative LLM-based AI simply as GELLMAI. The main objective of this paper is to explore, tentatively, the social impact of GELLMAI. While the question about the social impact of GELLMAI is undoubtedly important, any answers must be tentative and speculative at this point. We are still in the early stages of GELLMAI and may need to wait years, perhaps even decades, to fully understand its social implications.
However, drawing from our experiences with past technologies in history, our current understanding of GELLMAI, empirical knowledge about the social world, and sociological reasoning, we can engage in preliminary and speculative discussions. We offer our account below. We believe that the social impact of GELLMAI is enormous, with the potential to revolutionize not only the production of goods and services but also to fundamentally alter the organization of human societies and the nature of daily life.
- Asia > China (0.74)
- North America > United States > California (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Social Sector (1.00)
- Law > Statutes (1.00)
- Information Technology > Security & Privacy (1.00)
We're Entering Uncharted Territory for Math
Terence Tao, a mathematics professor at UCLA, is a real-life superintelligence. The "Mozart of Math," as he is sometimes called, is widely considered the world's greatest living mathematician. He has won numerous awards, including the equivalent of a Nobel Prize for mathematics, for his advances and proofs. Right now, AI is nowhere close to his level. But technology companies are trying to get it there.
Enabling Large Language Models to Perform Power System Simulations with Previously Unseen Tools: A Case of Daline
Jia, Mengshuo, Cui, Zeyu, Hug, Gabriela
The integration of experiment technologies with large language models (LLMs) is transforming scientific research, extending AI capabilities beyond specialized problem-solving toward serving as research assistants for human scientists. In power systems, simulations are essential for research. However, LLMs face significant challenges in power system simulations due to limited pre-existing knowledge and the complexity of power grids. To address this issue, this work proposes a modular framework that integrates expertise from both the power system and LLM domains. This framework enhances LLMs' ability to perform power system simulations on previously unseen tools. Validated using 34 simulation tasks in Daline, an (optimal) power flow simulation and linearization toolbox not yet exposed to LLMs, the proposed framework improved GPT-4o's simulation coding accuracy from 0% to 96.07%, also outperforming the ChatGPT-4o web interface's 33.8% accuracy (with the entire knowledge base uploaded). These results highlight the potential of LLMs as research assistants in power systems.
- Machinery > Industrial Machinery (1.00)
- Energy > Power Industry (1.00)
A FAIR and Free Prompt-based Research Assistant
Shamsabadi, Mahsa, D'Souza, Jennifer
This demo presents the Research Assistant (RA) tool, developed to assist with six main types of research tasks. Each task is defined as a standardized instruction template, instantiated with user input, and finally applied as a prompt to AI tools well known for their sophisticated natural language processing abilities, such as ChatGPT (https://chat.openai.com/) and Gemini (https://gemini.google.com/app). The six research tasks addressed by RA are: creating FAIR research comparisons, ideating research topics, drafting grant applications, writing scientific blogs, aiding preliminary peer reviews, and formulating enhanced literature search queries. RA's reliance on generative AI tools like ChatGPT or Gemini means the same research task assistance can be offered in any scientific discipline. We demonstrate its versatility by sharing RA outputs in Computer Science, Virology, and Climate Science, where the output with the RA tool assistance mirrored that from a domain expert who performed the same research task.
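The idea of a standardized instruction template instantiated with user input can be sketched with Python's `string.Template`. The template wording, the task keys, and the `build_prompt` helper below are hypothetical illustrations, not the RA tool's actual templates; the resulting string is what would be pasted or sent to a chat model.

```python
from string import Template

# Hypothetical templates for two of the six task types named in the abstract.
TEMPLATES = {
    "ideate_topics": Template(
        "Suggest $n research topics in $field related to: $problem"
    ),
    "search_query": Template(
        "Formulate an enhanced literature search query for: $problem"
    ),
}


def build_prompt(task, **user_input):
    """Fill the template for `task` with user input.

    Raises KeyError if the task is unknown or a placeholder is missing.
    """
    return TEMPLATES[task].substitute(**user_input)


if __name__ == "__main__":
    print(build_prompt("ideate_topics", n=3, field="Virology",
                       problem="viral evolution"))
```

Because the discipline is just another placeholder, the same template can be reused across fields, in line with the abstract's point that the assistance carries over to any scientific discipline.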
- Research Report (0.52)
- Workflow (0.50)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.54)